We are using an example data set from the Appendix of Shumway and Stoffer (2017). Additional examples of R code for exploring time series data are found in Hyndman and Athanasopoulos (n.d.). For this analysis we use tidyverse (Wickham et al. 2019), lubridate (Grolemund and Wickham 2011), PerformanceAnalytics (Peterson and Carl 2020), Hmisc (Harrell Jr et al. 2020), astsa (Stoffer 2020), ggfortify (Horikoshi and Tang 2018), tseries (Trapletti and Hornik 2020), forecast (Hyndman and Khandakar 2008), and gridExtra (Auguie 2017).

Time series data can be thought of as data consistently measured or observed at different points in time. In this way, the data have an implicit order or flow, and questions about time series data often involve questions about change over time.

#Libraries
library(tidyverse)
library(lubridate)
library(PerformanceAnalytics)
library(Hmisc)
library(astsa)
library(ggfortify)
library(tseries)
library(forecast)
library(gridExtra)

1 Univariate Analysis

The AirPassengers dataset, which ships with base R in the datasets package, provides monthly totals of international airline passengers (in thousands) from 1949 to 1960. The data are already stored as a time series (ts) object, so no further class or date manipulation is required.

data(AirPassengers)
AP <- AirPassengers
# Take a look at the class of the dataset AirPassengers
class(AP)
## [1] "ts"

The object is of class ts, which means it has inherent time structure. (Note that this time structure is not fully apparent from the head() command.)
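That inherent time structure can be inspected directly with base R accessors from the stats package, all of which read the attributes stored on the ts object:

```r
# Inspect the time attributes stored in the ts object
start(AP)      # c(1949, 1): the series begins in January 1949
end(AP)        # c(1960, 12): the series ends in December 1960
frequency(AP)  # 12: monthly observations
head(time(AP)) # fractional-year time index: 1949.000, 1949.083, ...
```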

# Take a look at the entries
AP
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
head(AP)
##      Jan Feb Mar Apr May Jun
## 1949 112 118 132 129 121 135

Initially, we want to check a few key assumptions about the ts object:

  1. There are no NA values
  2. The frequency is 12, one observation per month
  3. The cycle is correct (are there 12 months per year, and are they ordered correctly?)
# Check for missing values
sum(is.na(AP))
## [1] 0
# Check the frequency of the time series
frequency(AP)
## [1] 12
# Check the cycle of the time series
cycle(AP)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949   1   2   3   4   5   6   7   8   9  10  11  12
## 1950   1   2   3   4   5   6   7   8   9  10  11  12
## 1951   1   2   3   4   5   6   7   8   9  10  11  12
## 1952   1   2   3   4   5   6   7   8   9  10  11  12
## 1953   1   2   3   4   5   6   7   8   9  10  11  12
## 1954   1   2   3   4   5   6   7   8   9  10  11  12
## 1955   1   2   3   4   5   6   7   8   9  10  11  12
## 1956   1   2   3   4   5   6   7   8   9  10  11  12
## 1957   1   2   3   4   5   6   7   8   9  10  11  12
## 1958   1   2   3   4   5   6   7   8   9  10  11  12
## 1959   1   2   3   4   5   6   7   8   9  10  11  12
## 1960   1   2   3   4   5   6   7   8   9  10  11  12

1.1 Visualize the Data: Basic Plots

Next we can visualize the data by simply plotting the ts object.

# Plot the raw data using the base plot function
plot(AP,xlab="Date", ylab = "Passenger numbers (Thousands)",main="Air Passenger numbers from 1949 to 1961")

As an alternative to the base plot function, we can use the ggfortify extension in R, which works with the ggplot2 package. This way we avoid having to convert to a data frame, as ggplot2 would otherwise require, while still having access to the layered grammar of graphics.

autoplot(AP) + labs(x ="Date", y = "Passenger numbers (Thousands)", title="Air Passengers from 1949 to 1961") + theme_bw()

These basic plots provide a lot of information about the nature of the time series, particularly with respect to periodicity and trend.

First, we can see from the plot a repeating peak-trough shape that seems to recur every year or so. This is evidence of periodic change, and perhaps seasonality.

Second, we can see that the amplitude (the distance from trough to peak) increases as time goes on. This may be evidence of a nonstationary time series.

Third, we can see evidence of a general trend: overall, the number of passengers increases from 1950 to 1960, over and above the periodic fluctuations.

We can further investigate seasonality with a boxplot:

#Let’s use the boxplot function to see any seasonal effects.

boxplot(AP~cycle(AP),xlab="Date", ylab = "Passenger Numbers (1000's)" ,main ="Monthly Air Passengers Boxplot from 1949 to 1961")

From these exploratory plots, we can make some initial conclusions:

  • The passenger numbers increase year over year, which may indicate an increasing linear trend, perhaps due to growing demand for air travel and the commercialization of airlines in that period.

  • In the boxplot, months 6 to 9 have higher means and higher variances than the other months, indicating seasonality with an apparent cycle of 12 months. A plausible explanation is more people taking holidays and flying over the summer months in the US.

  • The data appear to form a multiplicative time series: as the passenger numbers increase, so does the amplitude of the seasonal pattern.

  • There do not appear to be any outliers, and there are no missing values, so no data cleaning is required.
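An informal check of the multiplicative claim is to replot the series on a log scale: if the seasonal swings grow with the level, log() should make the amplitude roughly constant around a linear trend. A minimal sketch using the same ggfortify autoplot approach:

```r
# If the seasonality is multiplicative, taking logs should make the
# seasonal amplitude roughly constant around a linear trend
autoplot(log(AP)) +
  labs(x = "Date", y = "log(Passenger numbers)",
       title = "Log-transformed Air Passengers") +
  theme_bw()
```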

1.2 Visualize the Data: Decomposition

Continuing to use ggfortify, we can further explore the multiplicative nature of the time series using the decompose(data, type) command.

# With this model, we will use the decompose function in R. Continuing to use
# ggfortify for plots, autoplot these decomposed components in one line to
# further analyze the data.

decomposeAP <- decompose(AP,"multiplicative")
autoplot(decomposeAP)

In these decomposed plots we can again see the trend and seasonality as inferred previously, but we can also observe the estimation of the random component depicted under the “remainder”.
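As a sanity check on the decomposition, the components stored in decomposeAP should multiply back to the original series under the multiplicative model (a quick sketch; the moving-average trend is NA at the ends of the series, hence na.rm):

```r
# Multiplicative model: observed = trend * seasonal * random,
# so the product of the components should reproduce the original series
reconstructed <- decomposeAP$trend * decomposeAP$seasonal * decomposeAP$random
max(abs(reconstructed - AP), na.rm = TRUE)  # near zero, up to floating-point error
```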

1.3 Exploring Stationarity

A (weakly) stationary time series satisfies three conditions: the mean, the variance, and the autocovariance must not be functions of time. (Strict stationarity is a stronger condition, requiring the entire joint distribution to be time invariant.)

In order to fit ARIMA models, the time series is required to be stationary. We will use two methods to test the stationarity.

  1. Augmented Dickey-Fuller Test (ADF)
  2. Autocorrelation plots

In order to test the stationarity of the time series, let’s run the Augmented Dickey-Fuller Test using the adf.test function from the tseries R package.

For this test, the null hypothesis, \(H_0\) is that the time series is nonstationary, and the alternative hypothesis, \(H_A\) is that the time series is stationary (mean, variance, covariance are time independent).

adf.test(AP) 
## 
##  Augmented Dickey-Fuller Test
## 
## data:  AP
## Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

If we set \(\alpha\) to 0.05, the p-value of 0.01 leads us to reject the null hypothesis, so the test suggests the time series is stationary. This conflicts with our initial visualizations; note that adf.test includes a constant and a linear trend in its test regression, so it assesses stationarity around a deterministic trend, and it does not account for the changing variance we observed.

We can also use autocorrelation to help assess if our time series is stationary.

The general idea behind autocorrelation is the concept of a lag, or a delay between one point \(x_s\) and another point \(x_t\) in the same series. A lag of 1 means comparing the original time series (\(x_1, x_2, ..., x_{s-1}\)) with its shifted counterpart (\(x_2, x_3, ..., x_s\)). Note that the overlapping length of the two series shrinks with a forward or backward lag of any length (except 0). The correlations between the original time series and its lagged counterparts form the basis of autocorrelation.
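To make the idea concrete, the lag-1 autocorrelation can be computed by hand: correlate the series with its own values shifted by one position. (A rough sketch; acf() centers on the full-series mean and uses a slightly different normalization, so the two numbers will be close but not identical.)

```r
# Manual lag-1 autocorrelation: pair each x_t with x_{t+1}
x <- as.vector(AP)
n <- length(x)
cor(x[-n], x[-1])            # plain correlation of the series with its lag-1 counterpart
acf(AP, plot = FALSE)$acf[2] # lag-1 value reported by acf(), for comparison
```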

We will use the autocorrelation function acf from the base stats package in R. This function plots the correlation between a series and its lags, with a 95% confidence band in blue. If the autocorrelation value crosses the dashed blue line, that lag is significantly correlated with the current series.

autoplot(acf(AP,plot=FALSE))+ labs(title="Correlogram of Air Passengers from 1949 to 1961")+
  theme_bw()

The local maximum at lag 1 (that is, 12 months; for monthly data the lag axis is in years) indicates a positive relationship with the 12-month cycle.

Since we have already created the decomposeAP list object with a random component, we can plot the ACF of decomposeAP$random, which corresponds to the remainder panel in our initial decomposition.

# Review random time series for any missing values
decomposeAP$random 
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1949        NA        NA        NA        NA        NA        NA 0.9516643
## 1950 0.9626030 1.0714668 1.0374474 1.0140476 0.9269030 0.9650406 0.9835566
## 1951 1.0138446 1.0640180 1.0918541 1.0176651 1.0515825 0.9460444 0.9474041
## 1952 1.0258814 1.0939696 1.0134734 0.9695596 0.9632673 1.0003735 0.9468562
## 1953 0.9976684 1.0151646 1.0604644 1.0802327 1.0413329 0.9718056 0.9551933
## 1954 0.9829785 0.9232032 1.0044417 0.9943899 1.0119479 0.9978740 1.0237753
## 1955 1.0154046 0.9888241 0.9775844 1.0015732 0.9878755 1.0039635 1.0385512
## 1956 1.0066157 0.9970250 0.9876248 0.9968224 0.9985644 1.0275560 1.0217685
## 1957 0.9937293 0.9649918 0.9881769 0.9867637 0.9924177 1.0328601 1.0261250
## 1958 0.9954212 0.9522762 0.9469115 0.9383993 0.9715785 1.0261340 1.0483841
## 1959 0.9825176 0.9505736 0.9785278 0.9746440 1.0177637 0.9968613 1.0373136
## 1960 1.0039279 0.9590794 0.8940857 1.0064948 1.0173588 1.0120790        NA
##            Aug       Sep       Oct       Nov       Dec
## 1949 0.9534014 1.0022198 1.0040278 1.0062701 1.0118119
## 1950 0.9733720 1.0225047 0.9721928 0.9389527 1.0067914
## 1951 0.9397599 0.9888637 0.9938809 1.0235337 1.0250824
## 1952 0.9931171 0.9746302 1.0046687 1.0202797 1.0115407
## 1953 0.9894989 0.9934337 1.0192680 1.0009392 0.9915039
## 1954 0.9845184 0.9881036 0.9927613 0.9995143 0.9908692
## 1955 0.9831117 1.0032501 1.0003084 0.9827720 1.0125535
## 1956 1.0004765 1.0008730 0.9835071 0.9932761 0.9894251
## 1957 1.0312668 1.0236147 1.0108432 1.0212995 1.0005263
## 1958 1.0789695 0.9856540 0.9977971 0.9802940 0.9405687
## 1959 1.0531001 0.9974447 1.0013371 1.0134608 0.9999192
## 1960        NA        NA        NA        NA        NA
# Autoplot the random time series from positions 7:138, which excludes the NA values
autoplot(acf(decomposeAP$random[7:138],plot=FALSE))+ labs(title="Correlogram of Air Passengers Random Component from 1949 to 1961") +theme_bw()
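A common recipe for moving a series like this toward stationarity before ARIMA modeling is a log transform (to stabilize the variance) followed by differencing (to remove the trend). A sketch, which can be checked with the same adf.test and acf tools used above:

```r
# Stabilize the variance with a log, then remove the trend by differencing
stationaryAP <- diff(log(AP))
autoplot(stationaryAP) +
  labs(title = "Differenced log Air Passengers") +
  theme_bw()
# Retest: the differenced log series should look far more stationary
adf.test(stationaryAP)
```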

2 Extra Topics: Model Fitting

We can use the stationarity exploration to then fit models to the data.

2.1 Linear Model

Since there is an upward trend, we will first fit a linear model for comparison. We plot the raw AirPassengers data with a blue linear fit.

autoplot(AP) + geom_smooth(method="lm")+ labs(x ="Date", y = "Passenger numbers (1000's)", title="Air Passengers from 1949 to 1961")

This may not be the best model to fit as it doesn’t capture the seasonality and multiplicative effects over time.
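One way to capture at least part of the seasonality within a regression framework is the forecast package's tslm() helper, which accepts special trend and season terms (the latter fits monthly dummy variables). A sketch overlaying the fitted values on the raw series:

```r
# Regression with a linear trend plus monthly dummy variables via forecast::tslm
fitLM <- tslm(AP ~ trend + season)
autoplot(AP) +
  autolayer(fitted(fitLM), series = "Trend + season fit") +
  labs(x = "Date", y = "Passenger numbers (1000's)",
       title = "Air Passengers with trend + season regression")
```

This still assumes additive seasonality, so it underfits the growing amplitude; fitting tslm on log(AP) would be one way to accommodate the multiplicative pattern.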

2.2 ARIMA Model

Instead, we might try an AutoRegressive Integrated Moving Average (ARIMA) model. This is a form of regression that uses lagged values of the series to explain variation.

As a simple introduction, we can use the auto.arima function from the forecast R package to select a model and estimate its coefficients under the default parameters, with seasonality enabled. Note that this is the automatic ARIMA modeling procedure of Hyndman and Khandakar (2008).

arimaAP <- auto.arima(AP)
arimaAP
## Series: AP 
## ARIMA(2,1,1)(0,1,0)[12] 
## 
## Coefficients:
##          ar1     ar2      ma1
##       0.5960  0.2143  -0.9819
## s.e.  0.0888  0.0880   0.0292
## 
## sigma^2 estimated as 132.3:  log likelihood=-504.92
## AIC=1017.85   AICc=1018.17   BIC=1029.35

2.3 Forecasting

Finally, we can plot a forecast of the time series using the forecast function, again from the forecast R package, with a 95% confidence interval, where \(h\) is the forecast horizon in months.

forecastAP <- forecast(arimaAP, level = c(95), h = 36)
autoplot(forecastAP)
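To gauge how trustworthy such forecasts are, we could hold out the final years, refit on the remainder, and score the forecasts against the held-out observations with accuracy(). A sketch using window() to split the series (the three-year holdout is an arbitrary choice):

```r
# Train on 1949-1957, hold out 1958-1960
trainAP <- window(AP, end = c(1957, 12))
testAP  <- window(AP, start = c(1958, 1))
fitTrain <- auto.arima(trainAP)
fcTrain  <- forecast(fitTrain, h = length(testAP))
accuracy(fcTrain, testAP)  # RMSE, MAE, MAPE, etc. on training and test sets
```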

3 Multivariate Analysis

Sometimes we might have a collection of time series that we might want to interpret together. For example, we can work with three data sets from Shumway and Stoffer’s 2017 book Shumway and Stoffer (2017).

  1. The first data set is called cmort and captures the average weekly cardiovascular mortality in Los Angeles County from 1970 to 1980.

  2. The second data set is called part and captures the average weekly particulate matter concentration in Los Angeles County from 1970 to 1980.

  3. The third data set is called tempr and captures the average weekly temperature in Los Angeles County from 1970 to 1980.

We can instruct R to read and format each data set using the ts(data, start = c(), frequency) arguments, where start gives the year and week at which the data begin, and frequency specifies the number of observations per year (52 weeks). Using ts objects is very handy because we don't have to store and manipulate dates (as long as the intervals are regular), and visualization becomes easier.

## Load in and save data from astsa package
data(cmort)
cmort <- cmort
data(tempr)
tempr <- tempr
data(part)
part <- part
## Save as ts objects
cmort.ts <- ts(cmort, start = c(1970,1),frequency = 52)
part.ts <- ts(part, start = c(1970,1), frequency = 52)
tempr.ts <- ts(tempr, start=c(1970,1), frequency = 52)


The first thing we might want to do is simply visualize the data. Because the cmort, tempr, and part time series are all on the same time scale (weekly, 1970-1980), we can plot them in the same stacked figure.

#Set figure parameters
layout(matrix(c(1:3, 1:3), ncol=2), height=c(.5,.5, .5)) 
par(mar=c(.2,2,0,.5), oma=c(3.2,3,2.5,0.5), las=1)

#Particulate matter
plot(part.ts, type='n', xaxt='n', ylab='PM Conc.')
grid(lty=1, col=gray(.9))
lines(part.ts, col = "cadetblue3")

#Cardiovascular mortality
plot(cmort.ts, type='n', xaxt='n', ylab='Cardiac Mortality')
grid(lty=1, col=gray(.9))
lines(cmort.ts, col="orange3")

#Temperature
plot(tempr.ts, type='n', ylab="Temperature")
grid(lty=1, col=gray(.9))
lines(tempr.ts, col="forestgreen")

#Labels
mtext(side=1, "Year", line=2)
title(main = "Particulate Concentration, Cardiovascular Mortality, and Temperature", outer = T)

The first subplot (blue line) shows the weekly average particulate matter concentration. The next subplot (orange line) shows weekly cardiovascular mortality. The final line, in green, shows the average weekly temperature. From a descriptive standpoint, a few things stand out about this plot:

  • All of these lines exhibit periodicity, a repetitive cycle of peaks and troughs.

  • The peaks in the particulate matter line (blue) and peaks in the cardiovascular mortality line (orange) are relatively similar – they seem to occur within close proximity of each other.

  • The green line (temperature) appears to be somewhat offset from the cardiovascular and particulate matter lines.

  • The amplitude appears to be decreasing for the cardiovascular mortality, and relatively constant for the other lines.

Note the strong seasonal components in all of the series, corresponding to winter-summer variations, and the downward trend in cardiovascular mortality over the 10-year period (Shumway and Stoffer 2006, 2017).

We might start by visualizing the pairwise relationships between the time series using a simple correlation chart. The diagonal displays histograms and kernel density estimates of the variables. The lower off-diagonal shows bivariate scatterplots with a lowess fitted line, and the upper off-diagonal shows Pearson's \(R\) coefficients with significance levels (\(\cdot\) p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001).

#Correlation, scatter plots, and bivariate regression
df <- cbind(Particulates=part,
            Mortality=cmort,
            Temperature=tempr)
chart.Correlation(df)

We might also consider looking for seasonality by either plotting weekly totals, or aggregating up to month.

## Weekly Aggregates
boxplot(cmort.ts~cycle(cmort.ts),xlab="Date", ylab = "Cardiovascular Mortality", main = "Weekly Cardiovascular Mortality Boxplot from 1970 to 1980")

boxplot(tempr.ts~cycle(tempr.ts),xlab="Date", ylab = "Temperature", main = "Weekly Temperature Boxplot from 1970 to 1980")

boxplot(part.ts~cycle(part.ts),xlab="Date", ylab = "Particulate Matter", main = "Weekly Particulate Matter Conc. Boxplot from 1970 to 1980")

We also might want to visualize the decomposition of the cardiovascular mortality time series, which looks additive from our data.

cmort.decomp <- decompose(cmort.ts, "additive")
tempr.decomp <- decompose(tempr.ts,"additive")
autoplot(tempr.decomp)

autoplot(cmort.decomp)

We can also consider each time series separately by plotting the lag correlations (as below with the cardiovascular mortality data).

## Plotting correlation:
layout(matrix(c(1:3, 1:3), ncol=2), height=c(.5,.5, .5)) 
par(mar=c(.2,.5,1.5,.5), oma=c(3.2,3,1.5,0.5), las=1)
par(mfrow=c(3,3))
n <- seq(1,18,2)
for (i in n){
  plot(as.vector(cmort),lag(as.vector(cmort),n=i),
       xlab= "C.Mort.", ylab=paste("Lag",i),
       main = paste("Lag",i),xaxt='n',yaxt='n')
}
title(main = "Cardiovascular Mortality with Lags",outer = T)

We can see this pattern confirmed in the ACF plot (also called a correlogram). To measure dependence, we can use the autocorrelation function (ACF) of the time series. The ACF measures the linear predictability of the series (\(x_1, x_2, ..., x_n\)) at time \(s\) using only its value at a nearby time \(t\), via a linear combination of inputs (Shumway and Stoffer 2006). In particular, the ACF indicates how well the series at time \(s+1\) can be forecast from the value \(x_s\).

## Plotting ACF Cardiovascular Mortality
autoplot(acf(cmort.ts,plot=F, type = "correlation"))+labs(title="Correlogram of Cardiovascular Mortality, 1970-1980")+geom_vline(xintercept = .308,col="pink",lty=2) +theme_bw()

Recall from the univariate case that the ACF can also help us assess stationarity: prolonged runs of positive or negative values indicate nonstationarity. These plots show significant correlations that persist too long for the series to be stationary.

## Plot particulate matter
g1 <- autoplot(acf(part.ts,plot=F, type = "correlation"))+labs(title="Correlogram of Particulate Matter, 1970-1980")+theme_bw()
## Plot temperature
g2 <- autoplot(acf(tempr.ts,plot=F, type = "correlation"))+labs(title="Correlogram of Temperature, 1970-1980") +theme_bw()
grid.arrange(ncol=1,g1,g2)

We can also assess the relationship between two time series explicitly using the sample cross-correlation. For some background, the sample CCF is the set of sample correlations between, for example, \(part_{t+h}\) and \(cmort_t\), where \(h\) is the lag (\(0, \pm 1, \pm 2, ...\)).

  • When \(h\) is negative, we interpret that as particulate matter leading cardiovascular mortality (for example)
  • When \(h\) is positive, we interpret that as particulate matter lagging cardiovascular mortality (for example)

We can investigate this using the ccf command in R.

pm_cmort.ccf <- ccf(part,cmort,type="correlation", plot = F)
autoplot(pm_cmort.ccf)+ggtitle("Cardiovascular Mortality and Particulate Matter CCF")

temp_cmort.ccf <- ccf(tempr,cmort,type="correlation", plot = F)
autoplot(temp_cmort.ccf)+ggtitle("Cardiovascular Mortality and Temperature CCF")

From these plots, the peak particulate matter lead is around -0.1356, and the peak for mean-centered temperature is around -0.4433, which again suggests that temperature leads cardiovascular mortality.

Finally, simple scatterplots can tell us a lot about the relationship between cardiovascular mortality and temperature and/or particulate matter. The scatterplots indicate a possibly nonlinear relation between mortality and temperature, and a more linear pattern between mortality and particulate matter.

par(mfrow=c(1,2))
plot(tempr, cmort, xlab="Temperature", ylab="Mortality", main = "Mortality and \nTemperature")
lines(lowess(tempr, cmort),col="red")
plot(part,cmort,xlab="Particulate Matter Conc.",ylab = "Mortality", main = "Mortality and \n Particulate Matter")
lines(lowess(part,cmort),col="red")

We can further check stationarity with the ADF test.

df <- cbind(Mortality = cmort.ts,PM = part.ts,Temp = tempr.ts)
#autoplot(df)+ggtitle("Time Series Cardiovascular Mortality, Temperature, PM Conc.")+theme(plot.title=element_text(hjust=0.5))
apply(df,2,adf.test)
## $Mortality
## 
##  Augmented Dickey-Fuller Test
## 
## data:  newX[, i]
## Dickey-Fuller = -5.4125, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
## 
## 
## $PM
## 
##  Augmented Dickey-Fuller Test
## 
## data:  newX[, i]
## Dickey-Fuller = -4.493, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
## 
## 
## $Temp
## 
##  Augmented Dickey-Fuller Test
## 
## data:  newX[, i]
## Dickey-Fuller = -4.4572, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

4 Modeling Multivariate Time Series in R

Based on our exploratory data analysis, we have a good sense of what to expect in a model:

  • nonstationarity
  • a quadratic relationship between temperature and mortality
  • additivity

So we can model this:

#Center temperature to account for scaling
temp <- tempr-mean(tempr)
temp2 <- temp^2
trend <- time(cmort) # time index

#Linear regression fit
summary(fit <- lm(cmort~ trend + temp + temp2 + part, na.action=NULL))
## 
## Call:
## lm(formula = cmort ~ trend + temp + temp2 + part, na.action = NULL)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.0760  -4.2153  -0.4878   3.7435  29.2448 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.831e+03  1.996e+02   14.19  < 2e-16 ***
## trend       -1.396e+00  1.010e-01  -13.82  < 2e-16 ***
## temp        -4.725e-01  3.162e-02  -14.94  < 2e-16 ***
## temp2        2.259e-02  2.827e-03    7.99 9.26e-15 ***
## part         2.554e-01  1.886e-02   13.54  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.385 on 503 degrees of freedom
## Multiple R-squared:  0.5954, Adjusted R-squared:  0.5922 
## F-statistic:   185 on 4 and 503 DF,  p-value: < 2.2e-16
#ANOVA table (compare to next line)
summary(aov(fit)) 
##              Df Sum Sq Mean Sq F value Pr(>F)    
## trend         1  10667   10667  261.62 <2e-16 ***
## temp          1   8607    8607  211.09 <2e-16 ***
## temp2         1   3429    3429   84.09 <2e-16 ***
## part          1   7476    7476  183.36 <2e-16 ***
## Residuals   503  20508      41                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(lm(cmort~cbind(trend, temp, temp2, part)))) 
##                                  Df Sum Sq Mean Sq F value Pr(>F)    
## cbind(trend, temp, temp2, part)   4  30178    7545     185 <2e-16 ***
## Residuals                       503  20508      41                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#AIC, BIC, AICc
num <- length(cmort) # sample size
AIC(fit)/num - log(2*pi) # AIC
## [1] 4.721732
BIC(fit)/num - log(2*pi) # BIC
## [1] 4.771699
(AICc <- log(sum(resid(fit)^2)/num) + (num+5)/(num-5-2)) # AICc
## [1] 4.722062

Having related mean-adjusted temperature and particulate levels to cardiovascular mortality, we can further examine the autocorrelation structure of the regression residuals.

#ACF
acf2(resid(fit), 52) # implies AR2

##      [,1] [,2] [,3] [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.34 0.44 0.28 0.28  0.16  0.12  0.07  0.01 0.03 -0.05 -0.02  0.00 -0.04
## PACF 0.34 0.36 0.08 0.06 -0.05 -0.05 -0.02 -0.05 0.02 -0.06  0.00  0.06 -0.02
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.02  0.01 -0.05 -0.01 -0.03 -0.06 -0.03 -0.03 -0.05  0.00 -0.02 -0.01
## PACF  0.00  0.04 -0.08  0.00 -0.01 -0.06  0.03  0.02 -0.03  0.05 -0.01  0.00
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF  -0.05  0.03 -0.11 -0.02 -0.10 -0.02 -0.09 -0.03 -0.10 -0.08 -0.10 -0.07
## PACF -0.06  0.06 -0.11  0.00 -0.05  0.05 -0.04  0.04 -0.06 -0.06 -0.05  0.03
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF  -0.07 -0.05 -0.05 -0.04 -0.03 -0.04  0.05  0.00  0.04  0.08  0.07  0.06
## PACF -0.02  0.02 -0.01  0.00  0.00 -0.01  0.08 -0.01 -0.01  0.08 -0.01 -0.01
##      [,50] [,51] [,52]
## ACF   0.05  0.07  0.02
## PACF -0.03  0.03 -0.04
sarima(cmort, 2,0,0, xreg=cbind(trend,temp,temp2,part))
## initial  value 1.849900 
## iter   2 value 1.733730
## iter   3 value 1.701063
## iter   4 value 1.682463
## iter   5 value 1.657377
## iter   6 value 1.652444
## iter   7 value 1.641726
## iter   8 value 1.635302
## iter   9 value 1.630848
## iter  10 value 1.629286
## iter  11 value 1.628731
## iter  12 value 1.628646
## iter  13 value 1.628634
## iter  14 value 1.628633
## iter  15 value 1.628632
## iter  16 value 1.628628
## iter  17 value 1.628627
## iter  18 value 1.628627
## iter  19 value 1.628626
## iter  20 value 1.628625
## iter  21 value 1.628622
## iter  22 value 1.628618
## iter  23 value 1.628614
## iter  24 value 1.628612
## iter  25 value 1.628611
## iter  26 value 1.628610
## iter  27 value 1.628610
## iter  28 value 1.628609
## iter  29 value 1.628609
## iter  30 value 1.628608
## iter  31 value 1.628608
## iter  32 value 1.628608
## iter  32 value 1.628608
## iter  32 value 1.628608
## final  value 1.628608 
## converged
## initial  value 1.630401 
## iter   2 value 1.630393
## iter   3 value 1.630382
## iter   4 value 1.630381
## iter   5 value 1.630375
## iter   6 value 1.630370
## iter   7 value 1.630362
## iter   8 value 1.630354
## iter   9 value 1.630349
## iter  10 value 1.630347
## iter  11 value 1.630347
## iter  12 value 1.630347
## iter  13 value 1.630347
## iter  14 value 1.630346
## iter  15 value 1.630346
## iter  16 value 1.630346
## iter  17 value 1.630346
## iter  17 value 1.630346
## iter  17 value 1.630346
## final  value 1.630346 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = xreg, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
##     REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1     ar2  intercept    trend     temp   temp2    part
##       0.3848  0.4326  3075.1482  -1.5165  -0.0190  0.0154  0.1545
## s.e.  0.0436  0.0400   834.7286   0.4226   0.0495  0.0020  0.0272
## 
## sigma^2 estimated as 26.01:  log likelihood = -1549.04,  aic = 3114.07
## 
## $degrees_of_freedom
## [1] 501
## 
## $ttable
##            Estimate       SE t.value p.value
## ar1          0.3848   0.0436  8.8329  0.0000
## ar2          0.4326   0.0400 10.8062  0.0000
## intercept 3075.1482 834.7286  3.6840  0.0003
## trend       -1.5165   0.4226 -3.5881  0.0004
## temp        -0.0190   0.0495 -0.3837  0.7014
## temp2        0.0154   0.0020  7.6117  0.0000
## part         0.1545   0.0272  5.6803  0.0000
## 
## $AIC
## [1] 6.130066
## 
## $AICc
## [1] 6.130507
## 
## $BIC
## [1] 6.196687

Initially, we treated the errors at time \(t\) as white noise. The sample ACF and partial ACF (PACF) of the residuals from the ordinary least squares fit instead suggest an AR(2) structure, which is why we refit the regression with AR(2) errors using sarima. The residual analysis output from sarima shows no obvious departure of the residuals from white noise.
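A formal complement to the residual plots is the Ljung-Box test from base stats, with the null hypothesis that the residuals are white noise. (A sketch; the choice of lag = 52 matches the weekly frequency, and other lag choices are defensible.)

```r
# Ljung-Box test on the OLS residuals: H0 = residuals are white noise.
# Given the AR(2) structure seen in the ACF/PACF, we expect a small p-value
# (rejecting white noise) here, motivating the refit with AR(2) errors.
Box.test(resid(fit), lag = 52, type = "Ljung-Box")
```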

References

Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.

Harrell Jr, Frank E, with contributions from Charles Dupont, and many others. 2020. Hmisc: Harrell Miscellaneous. https://CRAN.R-project.org/package=Hmisc.

Horikoshi, Masaaki, and Yuan Tang. 2018. Ggfortify: Data Visualization Tools for Statistical Analysis Results. https://CRAN.R-project.org/package=ggfortify.

Hyndman, Rob, and George Athanasopoulos. n.d. Forecasting: Principles and Practice (3rd Ed). 3rd ed. Accessed May 2, 2021. https://Otexts.com/fpp3/.

Hyndman, Rob J, and Yeasmin Khandakar. 2008. “Automatic Time Series Forecasting: The Forecast Package for R.” Journal of Statistical Software 27 (3): 1–22. https://www.jstatsoft.org/article/view/v027i03.

Peterson, Brian G., and Peter Carl. 2020. PerformanceAnalytics: Econometric Tools for Performance and Risk Analysis. https://CRAN.R-project.org/package=PerformanceAnalytics.

Shumway, Robert H., and David S. Stoffer. 2006. Time Series Analysis and Its Applications: With R Examples. 2nd [updated] ed. Springer Texts in Statistics. New York: Springer.

———. 2017. Time Series Analysis and Its Applications: With R Examples. Springer Texts in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-52452-8.

Stoffer, David. 2020. Astsa: Applied Statistical Time Series Analysis. https://CRAN.R-project.org/package=astsa.

Trapletti, Adrian, and Kurt Hornik. 2020. Tseries: Time Series Analysis and Computational Finance. https://CRAN.R-project.org/package=tseries.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.